Skip to content

Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377

Open
benalleng wants to merge 4 commits intopayjoin:masterfrom
benalleng:url-rewrite
Open

Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377
benalleng wants to merge 4 commits intopayjoin:masterfrom
benalleng:url-rewrite

Conversation

@benalleng
Copy link
Copy Markdown
Collaborator

@benalleng benalleng commented Mar 2, 2026

Closes #1124

This adds the internal Url Struct to no longer depend on the url crate directly for it.

However we still have reqwest inside payjoin url from payjoin-directory and test-utils which pull in the url dep, but that is only available on the io feature.

I have also added a fuzz target, which already helped me better shape the parse function better.

I have only ran the fuzzer for a limited amount of time however and some more cpu hours might be helpful

Claude really helped me get through the brunt of this.

You can test the lack of an accessible url dep not including the io feature with this command.

cargo tree -p payjoin --no-default-features --features v1,v2,directory,_manual-tls,_test-utils -e no-dev,no-build -i url
Pull Request Checklist

Please confirm the following before requesting review:

@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Mar 2, 2026

Coverage Report for CI Build 24522598976

Coverage increased (+0.4%) to 84.741%

Details

  • Coverage increased (+0.4%) from the base build.
  • Patch coverage: 36 uncovered changes across 8 files (513 of 549 lines covered, 93.44%).
  • 3 coverage regressions across 2 files.

Uncovered Changes

File Changed Covered %
payjoin-cli/src/app/v1.rs 19 1 5.26%
payjoin/src/core/url.rs 463 453 97.84%
payjoin-cli/src/app/config.rs 6 4 66.67%
payjoin/src/core/io.rs 3 1 33.33%
payjoin-cli/src/app/v2/ohttp.rs 6 5 83.33%
payjoin/src/core/ohttp.rs 3 2 66.67%
payjoin/src/core/receive/error.rs 1 0 0.0%
payjoin/src/core/receive/optional_parameters.rs 25 24 96.0%

Coverage Regressions

3 previously-covered lines in 2 files lost coverage.

File Lines Losing Coverage Coverage
payjoin/src/core/into_url.rs 2 94.74%
payjoin-cli/src/app/v1.rs 1 64.71%

Coverage Stats

Coverage Status
Relevant Lines: 13271
Covered Lines: 11246
Line Coverage: 84.74%
Coverage Strength: 403.4 hits per line

💛 - Coveralls

@benalleng benalleng changed the title Url rewrite Remove url dep from payjoin crate* Mar 2, 2026
@benalleng benalleng changed the title Remove url dep from payjoin crate* Remove url dep from payjoin crate Mar 2, 2026
@benalleng benalleng changed the title Remove url dep from payjoin crate Use internal Url struct in favor of url::Url to minimize the url dep in payjoin Mar 2, 2026
@benalleng benalleng marked this pull request as draft March 2, 2026 23:31
@benalleng

This comment was marked as resolved.

@benalleng benalleng force-pushed the url-rewrite branch 2 times, most recently from 7c97b6c to 8e51345 Compare March 27, 2026 15:10
@benalleng benalleng marked this pull request as ready for review March 27, 2026 15:42
Copy link
Copy Markdown
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can edit the spec to say "the host component of any URL in a BIP77 message MUST be an ASCII LDH (Letters, Digits, Hyphens) plus dots: [a-zA-Z0-9.-] hostname or an IPv4 address literal" to kill IDNA once and for all. For the full URL, we restrict the character set to RFC 3986 unreserved plus the structural delimiters we actually need (:/, ?, #, @). Reject any byte outside printable ASCII (0x21–0x7E) that isn't percent-encoded.

There's a specific URL fuzzer we might consider here: https://github.com/orangetw/Tiny-URL-Fuzzer https://github.com/orangetw/Tiny-URL-Fuzzer/blob/master/samples.txt could also test against

https://payjo.in\@evil.com/path
https://payjo.in%40evil.com/path
https://payjo.in#\@evil.com
https://payjo.in/../evil.com
https://payjo.in/%2e%2e/evil.com
https://evil.com:443@payjo.in/
https://payjo.in%00.evil.com/path
https://payjo.in:@evil.com/

not that these are reall attacks but just so we knwo we parse the same as url for stuff we could actually encounter. Low prio but I wanted to document.

Comment thread payjoin/src/core/url.rs Outdated
Comment thread payjoin/src/core/url.rs Outdated
}

#[derive(Debug, Clone, PartialEq, Eq)]
#[allow(dead_code)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self that this is removed by the time the PR's last commit comes around.

Copy link
Copy Markdown
Collaborator Author

@benalleng benalleng Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want support for ipv4 or ipv6 hosts in payjoin::Url if not we can just keep Host::Domain.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for testing especially this is still useful. What do you think?

Comment thread payjoin/src/core/receive/v2/mod.rs Outdated
Comment thread payjoin-cli/Cargo.toml Outdated
@DanGould
Copy link
Copy Markdown
Contributor

Tnull says "Def. no huge blocker if there is a path to dropping it!" but this seems pretty darn close so I'd like to close it out to keep the prs flowing

@benalleng benalleng force-pushed the url-rewrite branch 12 times, most recently from 652f4a7 to 4fca182 Compare April 14, 2026 20:15
@benalleng benalleng force-pushed the url-rewrite branch 3 times, most recently from 96f3353 to b3bcf75 Compare April 15, 2026 13:42
This commit migrates the monorepo away from the external Url dep to use
the new internal Url. Additionally due to the transition we need to add
a dep for url encoding with `percent-encoding-rfc3986` which
coincidentally get us inline with the bitcoin_uri crate.
Copy link
Copy Markdown
Contributor

@xstoicunicornx xstoicunicornx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a bit of feedback.

One other thing, seems like we are replicating the url::form_urlencoded::parse method in two different places so maybe its worth creating a native implementation of this in url.rs?

Comment thread payjoin/src/core/url.rs

pub fn query(&self) -> Option<&str> { self.query.as_deref() }

pub fn set_query(&mut self, query: Option<&str>) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a public fn it feels a little weird to have no validation done. Maybe just have a clear_query to easily set query to None and otherwise force usage of query_pairs_mut? Or reduce visibility to pub(crate)?

Comment thread payjoin/src/core/url.rs

pub fn join(&self, segment: &str) -> Result<Url, ParseError> {
// If the segment is a full URL (scheme://...), parse it independently.
// Only treat it as a full URL if :// appears before any / (i.e. in scheme position).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this comment was confusing to me when I originally read it

Suggested change
// Only treat it as a full URL if :// appears before any / (i.e. in scheme position).
// Only treat it as a full URL if no / appears before :// (i.e. in scheme position).

Comment thread payjoin/src/core/url.rs
scheme.push(c);
}
':' => break,
_ => return Err(ParseError::InvalidCharacter),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use ParseError::InvalidScheme instead, as this is more consistent with how parse_port errors are handled

Comment thread payjoin/src/core/url.rs
Comment on lines +438 to +457
let mut path = String::new();
let mut query: Option<String> = None;
let mut fragment: Option<String> = None;

if let Some(frag_pos) = input.find('#') {
let before_fragment = &input[..frag_pos];
fragment = Some(input[frag_pos + 1..].to_string());

if let Some(q_pos) = before_fragment.find('?') {
path.push_str(&before_fragment[..q_pos]);
query = Some(before_fragment[q_pos + 1..].to_string());
} else {
path.push_str(before_fragment);
}
} else if let Some(q_pos) = input.find('?') {
path.push_str(&input[..q_pos]);
query = Some(input[q_pos + 1..].to_string());
} else {
path.push_str(input);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing to follow why not just:

Suggested change
let mut path = String::new();
let mut query: Option<String> = None;
let mut fragment: Option<String> = None;
if let Some(frag_pos) = input.find('#') {
let before_fragment = &input[..frag_pos];
fragment = Some(input[frag_pos + 1..].to_string());
if let Some(q_pos) = before_fragment.find('?') {
path.push_str(&before_fragment[..q_pos]);
query = Some(before_fragment[q_pos + 1..].to_string());
} else {
path.push_str(before_fragment);
}
} else if let Some(q_pos) = input.find('?') {
path.push_str(&input[..q_pos]);
query = Some(input[q_pos + 1..].to_string());
} else {
path.push_str(input);
}
let (before_fragment, fragment) = match input.find('#') {
Some(pos) => (&input[..pos], Some(input[pos + 1..].to_string())),
None => (&input[..], None),
};
let (path, query) = match before_fragment.find('?') {
Some(pos) =>
(before_fragment[..pos].to_string(), Some(before_fragment[pos + 1..].to_string())),
None => (before_fragment.to_string(), None),
};

Comment thread payjoin/src/core/url.rs
Comment on lines +127 to +130
Some(ref h) if h.is_empty() => return Err(ParseError::EmptyHost),
Some(Host::Domain(ref d))
if !d.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.') =>
return Err(ParseError::InvalidHost),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense for these parsing errors to be caught in parse_host (would also be more consistent with the other parsing functions)?

Also, if we are erroring for ParseError::EmptyHost why does the host need to be an Option<Host> rather than just Host? This validation seems to ensure that the host will always exist. Not using an Option<Host> would also eliminate the need for has_host.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove url crate dep from payjoin

4 participants